System, Method and Engine for Playing SMIL Based Multimedia Contents

ABSTRACT

A system for playing SMIL based multimedia contents, comprising: a plurality of SMIL engines for analyzing and interpreting SMIL documents, as well as communicating with and controlling SMIL sub engines, remote media proxies, or local media playing devices; a plurality of remote media proxies for receiving instructions from the upper level SMIL engines, starting or stopping providing media objects to the remote media playing devices, sending back events, and providing basic user interaction capabilities, wherein said a plurality of SMIL engines, a plurality of remote media proxies, and local and remote media playing devices construct a tree-link structure, of which the root node is a SMIL engine, the branch nodes are SMIL engines and remote media proxies, and the leaf nodes are local and remote media playing devices. The corresponding SMIL engines and methods are also provided. The present invention enables the playing of SMIL based multimedia contents on a set of PvC devices, which can be dynamically configured as a new multimedia terminal on demand.

TECHNICAL FIELD

The present invention relates to methods for playing Synchronized Multimedia Integration Language (SMIL) based multimedia contents in the field of computer networks, specifically, to the engine, system and method for playing said multimedia contents on pervasive computing (PvC) devices.

TECHNICAL BACKGROUND

SMIL is a multimedia control language defined by the World Wide Web Consortium (W3C) for encoding time based multimedia presentations delivered over the Web, defining when, where and how to play segments of multimedia (such as animation, audio, video, still image, static text and text stream). With such a language, a set of independent multimedia objects can be integrated into synchronized multimedia contents, and user interaction can be supported during playing. SMIL may be applied in Web TV, online courses, multimedia presentations, etc. SMIL has been widely used in client side applications by companies such as RealNetworks, Apple, and Microsoft.

Conventional SMIL browsers, such as X-Smiles, Apple QuickTime and Helix, can support the full profile and extensions of SMIL 2.0. These browsers, similar to Web browsers, are pure client software designed for PC-like multimedia devices.

With the advent of more and more networked PvC devices, such as Personal Digital Assistants (PDAs), mobile phones, in-home devices and telematics devices, a need has arisen to play SMIL based multimedia contents on such devices. However, a PvC device usually can only play media objects in one kind of media format due to the limitation of media objects playable on such a device, so it is difficult for one PvC device to support the playing of a plurality of media objects. For instance, a POTS telephone only supports audio; an IP telephone and a mobile phone may support audio and simple text input; a PDA may support text and still image, and even simple video stream, based on network browsing and user interaction; and a TV set and digital HiFi system support the playing of real-time video and audio stream. However, the resources of the PvC devices for playing the media objects described above are limited, so that only a subset of the standard multimedia types defined by SMIL can be played. Besides, the limitation of resources makes it difficult to implement such functions as multi-threading, complicated timing/synchronization mechanisms, and integration of a plurality of media on PvC devices.

There are prior methods for supporting SMIL on handheld devices. For instance, a 3rd Generation Mobile Communication Standard Partnership Project (3GPP) SMIL profile used for Multimedia Messaging Service (MMS) may support the playing of SMIL based multimedia contents on a mobile phone with SMIL interpretation capability. A 3GPP SMIL profile, however, is only a subset of the SMIL 2.0 basic profile, and the media contents that may be played on a handheld device are limited. As another example, US Patent Application 2003/0229847, entitled “Multimedia Reproducing Apparatus and Method”, discloses a method for reproducing SMIL based multimedia contents on a single mobile device. That application, however, focuses on serialization of the parallel timing and synchronization mechanism, and does not solve the problem of resources for playing complicated multimedia contents and flexible user interaction. Thus, both above-mentioned approaches can only support the playing of SMIL based multimedia contents on PvC devices with SMIL interpretation capability. As a result, it is impossible to play SMIL based multimedia contents on PvC devices without SMIL interpretation capability.

SUMMARY OF THE INVENTION

The present invention seeks to solve the above technical problems. Its purpose is to provide a SMIL engine, system and method for playing SMIL based multimedia contents on pervasive computing devices. With the SMIL engine, system and method of the present invention, a set of PvC devices may be dynamically configured to work in collaboration with each other for jointly playing SMIL based multimedia contents, so as to reduce the requirement on the media interaction capabilities of PvC devices. Thus, the full profile and extension of SMIL 2.0 may be supported.

According to an aspect of the present invention, there is provided a SMIL engine for playing SMIL based multimedia contents, comprising:

a media device registry for registering media devices controlled by said SMIL engine;

a SMIL parser for, based on the analysis of a SMIL document and the acquired information on the media interaction capabilities of the media devices, generating intermediate SMIL models and distributing the intermediate SMIL models to next level SMIL engines and/or remote media proxies and generating corresponding local proxy objects, and generating internal SMIL models to be deployed on a local SMIL interpreter;

a SMIL interpreter for interpreting and executing the playing logic of the SMIL document, triggering next level SMIL engines and/or remote media proxies and/or local media playing devices to play the media contents, and controlling interaction with a user; and

a remote event proxy for maintaining a mapping table that contains the relationships between said local proxy objects and the intermediate SMIL models distributed to the next level SMIL engines and/or remote media proxies, and being responsible for event transferring between the local SMIL engines and the next level SMIL engines and/or remote media proxies.

Preferably, the media devices controlled by said SMIL engine comprise: next level SMIL engines, remote media proxies and local media playing devices, all of which support a subset of SMIL defined media interaction capabilities and register respective media interaction capabilities and location information in said media device registry when the system starts up.

Preferably, said intermediate SMIL model comprises: a time container that contains multimedia contents distributed to a next level SMIL engine and a media object distributed to a remote media proxy; said internal SMIL model is a time container executable on a local SMIL interpreter, comprising the control logic of the present level SMIL model and the media object distributed to a local media playing device.

Preferably, said events comprise: timing events, document object model events, user interaction events, and internal events.

Preferably, said local media playing device comprises:

a media playing controller for driving a media player based on the triggering mechanism of said SMIL interpreter, and acquiring the media contents to be played; and

a media player for playing media contents.

According to another aspect of the present invention, there is provided a system for playing SMIL based multimedia contents, comprising:

a plurality of SMIL engines as described above, for analyzing, interpreting, and executing SMIL documents, as well as communicating with and controlling the next level SMIL engines, remote media proxies, or local media playing devices;

a plurality of remote media proxies for receiving instructions from the upper level SMIL engines, starting or stopping providing media objects to the remote media playing devices, sending back events, and providing basic user interaction capabilities;

said a plurality of SMIL engines, a plurality of remote media proxies, and local and remote media playing devices construct a tree-link structure, of which the root node is a SMIL engine, the branch nodes are SMIL engines and remote media proxies, and the leaf nodes are local and remote media playing devices.

According to still another aspect of the present invention, there is provided a method for playing SMIL based multimedia contents in said system, comprising the following steps:

analyzing, with a SMIL engine, a SMIL document and acquiring information on the media interaction capabilities of the media devices controlled by said SMIL engine;

based on the acquired information on the media interaction capabilities, with said SMIL engine, generating intermediate SMIL models, distributing the intermediate SMIL models to next level SMIL engines and/or remote media proxies, and generating corresponding local proxy objects; and/or generating internal SMIL models to be deployed on a local SMIL interpreter;

updating a mapping table, which records the relationships between said local proxy objects and the intermediate SMIL models distributed to the next level SMIL engines and the remote media proxies;

executing the above-mentioned steps recursively till the last level SMIL engines with the next level SMIL engines;

interpreting respectively the received intermediate SMIL models and generating internal SMIL models with each said SMIL engine; and

starting up the remote media playing devices and/or local media playing devices to play media contents according to time and events.

The present invention has the following advantages: 1) the invention may dynamically configure PvC devices on demand to construct a new multimedia terminal for playing multimedia contents; 2) the invention may meet the requirements for playing synchronous media contents on a set of PvC devices with limited resources; 3) compared with the conventional SMIL client application mode, the invention applies a distributed mode for better utilization of the performance of servers and intermediate nodes; 4) with the invention, PvC devices without SMIL interpretation capability may be integrated into SMIL based applications.

BRIEF DESCRIPTION OF THE DRAWINGS

Preferred embodiments of the present invention will now be described, by way of example only, with reference to the accompanying drawings in which:

FIG. 1 is a schematic diagram of a system for playing SMIL based multimedia contents according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a SMIL engine according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a typical application of the system shown in FIG. 1; and

FIG. 4 is a schematic flowchart of a method for playing SMIL based multimedia contents according to an embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 1 is a schematic diagram of a system for playing SMIL based multimedia contents according to an embodiment of the present invention. In practice, the SMIL based contents are usually described by SMIL documents. As shown in FIG. 1, the system comprises a plurality of SMIL engines 201 and a plurality of remote media proxies 202, constructing a hierarchical structure, that is, a tree-link structure, of which the root node is a SMIL engine 201 (which may be called a “SMIL root engine”).

The SMIL engines 201 at each level are used to analyze and interpret SMIL documents or parts of a document, as well as communicate with and control the remote media proxies 202 and/or next level SMIL engines 201 (called “SMIL sub-engines”) and/or local media playing devices.

The remote media proxies 202 are mainly used for controlling those remote media playing devices without SMIL interpretation capability, receiving instructions from the upper level SMIL engine(s) 201, starting or stopping providing media objects to the remote media playing devices, sending back events, and providing basic user interaction capability. In the media object provided to a remote media playing device by a remote media proxy 202, the media contents playable by the remote media playing device are defined, which may comprise, for instance, the address of the media server from which the remote media playing device acquires media contents. If the remote media playing device cannot access the media server, the media contents to be played are acquired from the media server by the remote media proxy 202 and provided to the remote media playing device for playing.

In the present specification, the media playing devices comprise local media playing devices and remote media playing devices, wherein a local media playing device means a media playing device directly controlled by a SMIL engine, whereas a remote media playing device means a media playing device controlled by a remote media proxy. The media playing devices may be PvC devices. A local media playing device further comprises a media playing controller and a media player. The media playing controller drives the media player according to the instructions from a SMIL engine, acquires media contents to be played, and then provides the media contents to the media player for playing.

After the SMIL root engine 201 has downloaded a SMIL document from a SMIL server, the SMIL document is analyzed and decomposed into smaller SMIL documents that may be in two forms, either a media object or multimedia contents contained in a time container. Then, these smaller SMIL documents are distributed to remote media proxies 202 and SMIL sub-engines 201, and/or kept locally as needed. A SMIL sub-engine continues to analyze the received SMIL sub-document and distribute even smaller SMIL documents to remote media proxies 202 and SMIL sub-engines 201, and/or keep them locally. The above process is repeated recursively till the last level SMIL sub-engines 201. In this way, the SMIL document is distributed to the related nodes of the system after decomposition. When the SMIL based multimedia contents are being played, each SMIL engine interprets the received SMIL document and invokes related devices based on time and events, such as timing events, Document Object Model (DOM) events, etc. Then, the media playing devices will, based on the definitions of media objects provided by the remote media proxies 202 and/or SMIL engines 201, acquire media contents from corresponding media servers for playing. If a media playing device cannot access a media server, the media contents may be acquired by the remote media proxy 202 and provided to the remote media playing device for playing.
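
The recursive decomposition described above may be illustrated by the following minimal Python sketch. It is not part of the disclosed embodiment: the sample document, the placeholder URLs, the function name decompose and the node naming scheme are all assumptions made for illustration, and the sketch simplifies by sending every nested time container to a sub-engine, whereas an engine may also keep parts locally.

```python
import xml.etree.ElementTree as ET

# Hypothetical SMIL fragment; element names follow SMIL 2.0, URLs are placeholders.
SMIL_DOC = """
<smil>
  <body>
    <par>
      <video src="rtsp://media1.example/movie"/>
      <audio src="rtsp://media2.example/track"/>
      <seq>
        <textstream src="rtsp://media2.example/subtitles"/>
        <text src="http://media2.example/quiz.txt"/>
      </seq>
    </par>
  </body>
</smil>
"""

TIME_CONTAINERS = {"par", "seq", "excl"}


def decompose(container, engine, plan):
    """Walk one time container held by `engine`.

    Media objects are recorded at this level; each nested time container is
    handed to a (hypothetical) sub-engine, which repeats the same process.
    """
    for index, child in enumerate(container):
        if child.tag in TIME_CONTAINERS:
            sub_engine = f"{engine}/sub-engine-{index}"
            plan.append((engine, "time container", child.tag, sub_engine))
            decompose(child, sub_engine, plan)  # recursion ends at the last level
        else:
            plan.append((engine, "media object", child.tag, child.get("src")))
    return plan


top_level = ET.fromstring(SMIL_DOC).find("body/par")
plan = decompose(top_level, "root-engine", [])
for row in plan:
    print(row)
```

Each row of the resulting plan names the engine that holds a piece of the document, so the list as a whole shows how one document ends up spread over several nodes.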

Besides, a user may interact in various forms with remote media proxies 202 or SMIL engines 201 through various media channels provided by different media playing devices (such as PvC devices). The events produced consequently may also control the playing of the media contents, thereby improving the flexibility of the playing of multimedia contents.

From the above descriptions, it can be seen that a system that applies the present embodiment may play SMIL based multimedia contents on a set of PvC devices with limited resources, which can be dynamically configured on demand, thereby constructing a new multimedia terminal that meets different requirements and further implements user interactions in various forms.

FIG. 2 is a schematic diagram of an embodiment of the SMIL engines in FIG. 1. As shown in FIG. 2, the SMIL engine 201 comprises a media device registry 301, a SMIL parser 302, a SMIL interpreter 303 and a remote event proxy 304. Each above-mentioned module will be described in detail as follows.

The media device registry 301 is used for registering media devices controlled by the SMIL engine 201, comprising: local media playing devices that may locally play media contents defined by the SMIL document; remote media proxies that, according to the instructions from the SMIL engine 201, control the remote media playing devices to play media contents defined by SMIL media objects; and next level SMIL engines that interpret parts of the SMIL document under the control of the SMIL engine 201. All of the above-mentioned three kinds of media devices support a subset of the media interaction capabilities defined by SMIL. When the system starts up, the interaction capabilities and location information of such media devices may be registered into the media device registry 301 manually or automatically.
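
A registry of this kind may be sketched, purely for illustration, as a small Python class. The class names MediaDevice and MediaDeviceRegistry, the field names, the capability labels and the example locations are assumptions and do not appear in the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaDevice:
    name: str
    kind: str                # "local", "remote-proxy" or "sub-engine" (assumed labels)
    location: str            # placeholder address of the device
    capabilities: frozenset  # subset of SMIL media interaction capabilities


class MediaDeviceRegistry:
    """Holds the devices controlled by one SMIL engine."""

    def __init__(self):
        self._devices = {}

    def register(self, device):
        # Called at system start-up, manually or by an automatic discovery step.
        self._devices[device.name] = device

    def find(self, capability):
        # Return the devices that declare the requested capability.
        return [d for d in self._devices.values() if capability in d.capabilities]


registry = MediaDeviceRegistry()
registry.register(MediaDevice("telephone", "local", "tel:0000", frozenset({"speech"})))
registry.register(MediaDevice("set-top-box", "remote-proxy", "10.0.0.2",
                              frozenset({"video", "audio"})))
registry.register(MediaDevice("pda-engine", "sub-engine", "10.0.0.3",
                              frozenset({"text", "textstream"})))

print([d.name for d in registry.find("audio")])
```

The parser can then query the registry by capability when it decides where each part of a document should go.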

When it is requested to play a SMIL document, the SMIL parser 302 downloads the corresponding SMIL document from the SMIL server, analyzes the SMIL document, understands the contents of the SMIL document, and then searches for media devices with proper media interaction capabilities in the media device registry 301. According to the information on the media devices, intermediate SMIL models and/or internal SMIL models are generated, comprising static information of the SMIL objects. The intermediate SMIL model and internal SMIL model may be in two forms: media object and time container that contains multimedia contents. The difference between them is that an intermediate SMIL model is a text or serialized model that can be transferred between different nodes on a network, while an internal SMIL model is an executable object model that can be executed on the SMIL interpreter. According to predefined binding rules and the information provided by the media device registry 301, the media objects in the intermediate SMIL model are distributed to the remote media proxies 202 and/or local media playing devices, while the time containers in the intermediate SMIL model are distributed to the SMIL sub-engines 201. The internal SMIL model is deployed on the SMIL interpreter 303, which is described below, and further controls lower level SMIL sub-engines, media proxies, and local media playing devices. The binding rules define the relationship between the intermediate SMIL model and/or internal SMIL model at the SMIL engine and the media devices controlled by the SMIL engine, which may be default or predefined. Once a time container is distributed to a SMIL sub-engine 201, a local proxy object of the time container is generated and transferred to the SMIL interpreter 303; once a media object is distributed to a remote media proxy 202, a local proxy object of the media object is generated and also transferred to the SMIL interpreter 303. The SMIL parser 302 exchanges events with the SMIL sub-engine 201 and the remote media proxy 202 to which the time container and media object are distributed, through a remote event proxy 304, which is described below.
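
One possible reading of the default binding rules is sketched below in Python. The function name bind, the rule table and the device records are hypothetical; the only rule taken from the description is that media objects go to local media playing devices or remote media proxies, while time containers go to SMIL sub-engines.

```python
# Assumed rule table: which kinds of device may receive which kind of model.
BINDING_RULES = {
    "media object": ("local", "remote-proxy"),
    "time container": ("sub-engine",),
}

# Hypothetical registered devices, in registration order.
DEVICES = [
    {"name": "set-top-box", "kind": "remote-proxy", "capabilities": {"video", "audio"}},
    {"name": "telephone", "kind": "local", "capabilities": {"speech"}},
    {"name": "pda-engine", "kind": "sub-engine", "capabilities": {"text", "textstream"}},
]


def bind(model_kind, capability, devices=DEVICES):
    """Return the first device allowed for this model kind that offers the
    required capability, or None when no registered device qualifies."""
    allowed_kinds = BINDING_RULES[model_kind]
    for device in devices:
        if device["kind"] in allowed_kinds and capability in device["capabilities"]:
            return device["name"]
    return None


print(bind("media object", "video"))
print(bind("media object", "speech"))
print(bind("time container", "text"))
```

Choosing the first matching device is only one simple policy; a predefined rule set could equally rank devices by location or load.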

The SMIL interpreter 303 mainly interprets the playing logic of SMIL documents, triggers the playing of media contents according to time and events, invokes corresponding remote media proxies 202 and/or SMIL sub-engines 201 and/or local media playing devices, and controls interaction with the user.

A mapping table, which is maintained in the remote event proxy 304, contains the relationship between the proxy objects in the local SMIL interpreter 303 and the media objects distributed to the remote media proxies 202 and time containers distributed to the SMIL sub-engines 201. The remote event proxy 304 is responsible for transferring serialized events, comprising SMIL timing events, DOM events, etc., between the local SMIL engine 201 and the SMIL sub-engines 201 and remote media proxies 202. Through these events the playing of media contents may be controlled.
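
The role of the mapping table may be illustrated with the following Python sketch. The class and method names, the use of JSON as the serialization format, and the event names are assumptions; the description only states that events are serialized and that the table relates local proxy objects to the distributed models.

```python
import json


class RemoteEventProxy:
    """Maps local proxy objects to distributed models and relays events."""

    def __init__(self, send):
        self._send = send  # callable(node, payload) standing in for the network
        self._table = {}   # local proxy id -> (remote node, remote model id)

    def register(self, proxy_id, node, model_id):
        self._table[proxy_id] = (node, model_id)

    def forward(self, proxy_id, event):
        # Outbound: look up where the model lives and send a serialized event.
        node, model_id = self._table[proxy_id]
        self._send(node, json.dumps({"model": model_id, "event": event}, sort_keys=True))

    def receive(self, node, payload):
        # Inbound: translate a remote event back to the local proxy object.
        message = json.loads(payload)
        for proxy_id, (known_node, model_id) in self._table.items():
            if known_node == node and model_id == message["model"]:
                return proxy_id, message["event"]
        return None


sent = []
proxy = RemoteEventProxy(lambda node, payload: sent.append((node, payload)))
proxy.register("proxy-video", "set-top-box", "video-1")
proxy.forward("proxy-video", "begin")
print(sent)
print(proxy.receive("set-top-box", '{"event": "end", "model": "video-1"}'))
```

Because the interpreter only ever addresses local proxy objects, it does not need to know on which node a given media object or time container actually runs.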

From the above description, it can be seen that the SMIL engine applying the present embodiment can analyze and interpret SMIL based multimedia contents, and distribute the generated media objects and time containers to the local media playing devices, remote media proxies and SMIL sub-engines. Through the recursive analysis of each SMIL engine, the SMIL based multimedia contents are distributed to the nodes of the system. And, when being played, with each SMIL engine's recursive interpretation of the media objects and time containers, the media contents are played on the media playing devices.

FIG. 3 is a schematic diagram of a typical application of the system of the present invention, which combines a telephone, a TV set, a HiFi system and a PDA for playing SMIL based multimedia contents, wherein the telephone, TV set, HiFi system and PDA are all resource limited PvC devices. The system shown in FIG. 3 is a two-level distributed structure, wherein the first level is a SMIL root engine, and the second level comprises a PDA with a SMIL engine (that is, a SMIL engine that controls a local playing device, the PDA), a set-top box attached to the TV set and the HiFi system (that is, a remote media proxy), and a telephone (a local media playing device of the SMIL root engine). The SMIL document is stored on the SMIL server. The locations and the supported media interaction capabilities of the telephone, TV set, HiFi system, PDA and set-top box are registered in the SMIL root engine. The media servers are used to store various media contents.

The SMIL document is downloaded to the SMIL root engine as requested. After the SMIL root engine analyzes the SMIL document, based on the binding rules and information of registered media devices, stream media objects for video and audio are distributed to the set-top box; time containers that contain text interaction and text stream are distributed to the PDA with a SMIL engine; and media objects for speech interaction are distributed to the telephone through the local SMIL interpreter. Then the PDA with a SMIL engine further analyzes the time containers and distributes the media objects for text stream to the local media device (PDA) through a local SMIL interpreter for executing. In this way, the SMIL document requested to be played is decomposed to various nodes of the system.

Then, the SMIL root engine starts up a main timer, invokes related devices according to time and events, and triggers the playing of SMIL based multimedia contents. If it is needed to play media contents of audio and video, related events are sent to the set-top box to trigger the playing. Because the HiFi system cannot access the media server 2, the set-top box acquires the media contents to be played from the media server 2 based on the definition of the audio media object, and then provides such media contents to the HiFi system for playing (as shown by the dashed lines); and for the TV set, after the set-top box provides the corresponding video media object, the TV set acquires the media contents to be played from the corresponding media server 1 (as shown by the dashed line). If there is a need for text interaction and playing of media contents of a text stream, related events are sent to the PDA with a SMIL engine, which interprets the time container, acquires the text stream from the corresponding media server 2 for playing based on the definitions of the text stream media object, and generates corresponding text interaction events. If there is a need for speech interaction, the SMIL root engine invokes the telephone to play, and the telephone acquires the media contents to be played from the corresponding media server 1 through the telephone network, based on the definitions of the speech media object, and generates corresponding speech interaction events. Of course, the events generated by the interaction between the user and the remote media proxy or SMIL engine may also start or stop playing media contents. When the main timer expires, the playing of media contents is finished.

FIG. 4 is a schematic flowchart of a method for playing SMIL based multimedia contents according to an embodiment of the present invention. The method may comprise two main steps: Step 400 for recursive configuration of SMIL models at respective nodes and Step 410 for the interpretation of the distributed SMIL model.

Before a SMIL document is played, each SMIL engine needs to establish a capability table in its media device registry, which is used for registering the media information, interaction capabilities and location information of the media devices controlled by the SMIL engine, such as local media playing devices, remote media proxies, and SMIL sub-engines.

At Step 400, when a user invokes a presentation of SMIL based multimedia contents through any interaction channel, first at Step 402, a SMIL parser in the SMIL engine acquires and analyzes a SMIL document, searches the capability table stored in the media device registry of the SMIL engine, and acquires information on the controlled media devices. Then, at Step 403, the SMIL engine, based on the above information, generates internal SMIL models deployed on the local SMIL interpreter and intermediate SMIL models distributed to the remote media proxies and SMIL sub-engines (Steps 404 to 406). Usually, a media object is bound to a local media playing device or a remote media proxy, while a time container is bound to a SMIL sub-engine. For a media object distributed to a remote media proxy and a time container distributed to a SMIL sub-engine, a corresponding local proxy object is generated. Then, at Step 407, the mapping table stored in the remote event proxy of the SMIL engine is updated, which records the relationships between the local proxy objects and the media objects and time containers distributed to the remote media proxies and SMIL sub-engines. At Steps 408 and 409, the SMIL parser distributes the media objects to the remote media proxies and distributes the time containers to the SMIL sub-engines. The SMIL sub-engine proceeds with the above process and finally the SMIL document is recursively configured to various nodes.

After the SMIL document is successfully configured to the whole system, Step 410 begins to execute for interpreting the distributed SMIL model. First, at Step 412, the SMIL interpreter of each SMIL engine interprets the respective intermediate SMIL model and local SMIL model. In the present embodiment, a main timer is set in the SMIL root engine for timing the playing of the SMIL document. A user interaction approach may also be used to start and stop playing media contents through events. At Step 413, the SMIL root engine starts up the main timer for timing. Each SMIL engine, after the interpretation of the respective SMIL models, may inform the local media playing devices to play the media contents according to timing or interactive events (Step 414), so that the local media playing devices, after acquiring the media contents to be played, play the media contents based on the definitions of the media object. Each SMIL engine may also, at Step 415, send events to the remote devices through remote event proxies to invoke various related devices, such as SMIL sub-engines or remote media proxies, and thereby control the operations of the SMIL sub-engines and remote media proxies so that the remote media playing devices play corresponding media contents. When a remote media proxy's operation is invoked to play media contents, the remote media proxy determines whether it is needed to acquire the media contents to be played based on the capabilities of the controlled remote media playing devices; if so, the media contents, after being acquired, are provided to the remote media playing devices for playing; if not, corresponding media objects are provided to the remote media playing devices, which acquire and play the media contents to be played based on the definitions of the media object. At Step 416, when the main timer expires, the playing of media contents is finished and the execution of the SMIL model is terminated.
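
The decision made by a remote media proxy when it is invoked may be sketched as follows. This Python fragment is illustrative only: the function name provide, the dictionary layout and the stand-in fetch function are assumptions, and real media transfer is replaced by a placeholder byte string.

```python
def provide(media_object, device_can_reach_server, fetch):
    """Decide what a remote media proxy hands to its playing device.

    If the device can reach the media server itself, it receives only the
    media object (here reduced to the source address). Otherwise the proxy
    fetches the contents and passes them on.
    """
    if device_can_reach_server:
        return {"kind": "media object", "src": media_object["src"]}
    return {"kind": "media contents", "data": fetch(media_object["src"])}


def fake_fetch(src):
    # Stand-in for a real download from a media server (assumption).
    return b"contents of " + src.encode("ascii")


video = {"src": "rtsp://media1.example/movie"}
audio = {"src": "rtsp://media2.example/track"}

for_tv = provide(video, device_can_reach_server=True, fetch=fake_fetch)
for_hifi = provide(audio, device_can_reach_server=False, fetch=fake_fetch)
print(for_tv)
print(for_hifi)
```

In the example of FIG. 3 this corresponds to the TV set fetching its own video while the HiFi system receives audio contents acquired by the set-top box.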

From the above description, it can be understood that by using the method of the present embodiment, it is possible to implement the playing of SMIL based multimedia contents on a set of PvC devices with limited resources, which can be dynamically configured as a new multimedia terminal.

Although the system, method and engine for playing SMIL based multimedia contents of the present invention have been described in detail through some exemplary embodiments, the above-mentioned embodiments are not exhaustive. Those skilled in the art may implement various changes and modifications within the spirit and scope of the present invention. Therefore, the present invention is not limited to these embodiments; the scope of the invention should only be defined by the appended claims herein.

1. A SMIL engine for playing SMIL based multimedia contents, comprising: a media device registry for registering media devices controlled by said SMIL engine; a SMIL parser for, based on the analysis of a SMIL document and the acquired information on the media interaction capabilities of the media devices, generating intermediate SMIL models, distributing the intermediate SMIL models to next level SMIL engines and/or remote media proxies and generating corresponding local proxy objects, and generating internal SMIL models to be deployed on a local SMIL interpreter; a SMIL interpreter for interpreting and executing the playing logic of the SMIL document, triggering next level SMIL engines and/or remote media proxies and/or local media playing devices to play the media contents, and controlling interaction with a user; and a remote event proxy for maintaining a mapping table that contains the relationships between said local proxy objects and the intermediate SMIL models distributed to the next level SMIL engines and/or remote media proxies, and being responsible for event transferring between the local SMIL engines and the next level SMIL engines and/or the remote media proxies.
2. The SMIL engine according to claim 1, wherein the media devices controlled by said SMIL engine comprise: next level SMIL engines, remote media proxies and local media playing devices, all of which support a subset of SMIL defined media interaction capabilities and register respective media interaction capabilities and location information in said media device registry when system starts up.
3. The SMIL engine according to claim 1, wherein said intermediate SMIL model comprises a time container that contains multimedia contents distributed to a next level SMIL engine and a media object distributed to a remote media proxy; and said internal SMIL model is a time container executable on a local SMIL interpreter, comprising the control logic of the present level SMIL model and the media object distributed to a local media playing device.
4. The SMIL engine according to claim 1, wherein said local media playing device comprises: a media playing controller for driving a media player according to the triggering mechanism of said SMIL interpreter, and acquiring media contents to be played; and a media player for playing media contents.
5. The SMIL engine according to claim 1, wherein said events comprise: timing events, document object model events, user interaction events, and internal events.
6. The SMIL engine according to any one of claims 1 to 5, wherein said local media playing devices may be pervasive computing devices.
7. A system for playing SMIL based multimedia contents, comprising: a plurality of SMIL engines according to any one of claims 1 to 6, for analyzing and interpreting SMIL documents, as well as communicating with and controlling next level SMIL engines, remote media proxies, or local media playing devices; a plurality of remote media proxies for receiving instructions from the upper level SMIL engines, starting or stopping providing media objects to the remote media playing devices, sending back events, and providing basic user interaction capabilities, wherein said a plurality of SMIL engines, a plurality of remote media proxies, and local and remote media playing devices construct a tree-link structure, of which the root node is a SMIL engine, the branch nodes are SMIL engines and remote media proxies, and the leaf nodes are local media playing devices and remote media playing devices.
8. The system according to claim 7, wherein said remote media proxy determines which media objects to be provided according to the capabilities of the remote media playing devices: if a remote media playing device cannot access any media server, the provided media object contains media contents to be played; and if the remote playing device can access a media server, the provided media object contains the address of the media server for acquiring media contents.
9. The system according to claim 7 or 8, wherein said local media playing devices and remote media playing devices may be pervasive computing devices.
10. A method for playing SMIL based multimedia contents in a system defined in any one of claims 7 to 9, comprising following steps: analyzing, with a SMIL engine, a SMIL document and acquiring information on the media interaction capabilities of the media devices controlled by said SMIL engine; based on the acquired information on the media interaction capabilities, with said SMIL engine, generating intermediate SMIL models, distributing the intermediate SMIL models to next level SMIL engines and/or remote media proxies and generating corresponding local proxy objects; and/or generating internal SMIL models to be deployed on a local SMIL interpreter; updating a mapping table, which records the relationships between said local proxy objects and the intermediate SMIL models distributed to the next level SMIL engine and the remote media proxies; proceeding with above-mentioned steps recursively till the last level SMIL engines with the next level SMIL engines; interpreting respectively the received intermediate SMIL models and generating internal SMIL models with each said SMIL engine; and starting up the remote media playing devices and/or local media playing devices to play media contents according to time and events.
11. The method according to claim 10, wherein the media devices controlled by said SMIL engine comprise: next level SMIL engines, remote media proxies, and local media playing devices, all of which support a subset of SMIL defined media interaction capabilities.
12. The method according to claim 10, wherein said intermediate SMIL model comprises a time container that contains multimedia contents distributed to a next level SMIL engine, and a media object distributed to a remote media proxy; said internal SMIL model is a time container executable locally, comprising the control logic of the present level SMIL model and the media object distributed to a local media playing device.
13. The method according to claim 10, wherein said events comprise: timing events, document object model events, user interaction events, and internal events.
14. The method according to claim 10, wherein said step for starting up local media playing devices to play media contents comprises: the local media playing devices acquire media contents from corresponding media servers based on the definitions of the received media objects and play the media contents.
15. The method according to claim 10, wherein said step for starting up remote media playing devices to play media contents comprises: the remote media proxies determine whether it is needed to acquire media contents to be played according to the capabilities of the controlled remote media playing devices; if so, the media contents to be played are acquired by the remote media proxies and provided to the remote media playing devices for playing; if not, corresponding media links or descriptors are provided to the remote media playing devices, which acquire and play media contents to be played.
16. The method according to any one of claims 10 to 15, wherein said local media playing devices and remote media playing devices may be pervasive computing devices.